---
title: "The Missing Majority Dashboard"
author: '<a href="https://remi-theriault.com" style="color: white">Rémi Thériault</a>'
output:
  flexdashboard::flex_dashboard:
    orientation: rows
    vertical_layout: fill
    social: menu
    source_code: embed
    # theme: lumen
    storyboard: false
    favicon: logo.ico
    css: style.css
---

<script>
   document.querySelector(".navbar-header").innerHTML =
            "<a href=\"#about\" class=\"navbar-brand navbar-inverse\">The Missing Majority Dashboard</a>";
</script> 

```{r setup, include=FALSE}
query_pubmed <- FALSE
```

```{r packages}
# Load packages
library(pubmedDashboard)
library(dplyr)
library(ggflags)

```

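```{r API_TOKEN_PUBMED, eval=query_pubmed, include=FALSE}
# On Windows, read the PubMed API key from the system keyring.
# Elsewhere, fall back to an environment variable (assumption: it is
# named API_TOKEN_PUBMED) so the check below always has a value to test.
if (Sys.info()["sysname"] == "Windows") {
  API_TOKEN_PUBMED <- keyring::key_get("pubmed", "rempsyc")
} else {
  API_TOKEN_PUBMED <- Sys.getenv("API_TOKEN_PUBMED")
}

check_pubmed_api_token(API_TOKEN_PUBMED)

```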

```{r save_process_pubmed_batch, results='hide', eval=query_pubmed}
# PLOS One (element 23 of journal_field$journal_short) has too many papers
# to process, so it is excluded for now
save_process_pubmed_batch(
  journal = journal_field$journal_short[-23],
  # journal = tail(journal_field$journal, -3),
  year_low = 2023,
  year_high = 2030,
  api_key = API_TOKEN_PUBMED)

```

# About {.hidden}

## Row 1 {data-height=500}

### {data-width=1160}

[The Global South continues to be underrepresented in psychological research]{.big_center}

###

#### First authors in psychology and behavioural science...

```{r}
# Percentage of first authors from the US (hard-coded value)
rate <- 56
flexdashboard::gauge(
  rate, min = 0, max = 100, symbol = "%",
  sectors = flexdashboard::gaugeSectors(
    success = c(0, 39), warning = c(40, 79), danger = c(80, 100)
  )
)
```

#### From the US

```{r}
# Percentage of first authors from North America and Europe (hard-coded value)
rate <- 87
flexdashboard::gauge(
  rate, min = 0, max = 100, symbol = "%",
  sectors = flexdashboard::gaugeSectors(
    success = c(0, 39), warning = c(40, 79), danger = c(80, 100)
  )
)
```

#### From North America and Europe

## Row 2

### Representativity of First Authors in Psychology

A large proportion of first authors in psychology and behavioural science are located in North America or Europe, mostly in the US ([Thalmayer et al., 2021](https://psycnet.apa.org/doi/10.1037/amp0000622), [Arnett, 2008](https://doi.org/10.1037/0003-066x.63.7.602)). Thus, most of the world, especially Africa and Latin America, is underrepresented, which could affect the validity and generalizability of psychological research. This dashboard presents aggregated data by continent, country, year, and journal (for first authors only) to better document this trend over time and, possibly, inform future public policy on the matter.

If this matters to you, please [reach out](https://www.busara.global/partner-with-us/).

---

**How to cite this dashboard?**

Thériault, R., & Forscher, P. (2024). *The Missing Majority Dashboard*. https://remi-theriault.com/dashboards/busara

### Choice of Journals

The data from this report originally included information about publications from six psychology journals (*Developmental Psychology*, *Journal of Personality and Social Psychology*, *Journal of Abnormal Psychology*, *Journal of Family Psychology*, *Health Psychology*, and *Journal of Educational Psychology*), for years 1980 to 2023.

These journals were initially selected based on Arnett and colleagues' papers. The dashboard now includes many more psychology and behavioural science journals, which were selected by a group of experts through brainstorming. You can see the full list of journals on the [Methods] tab. If you think a journal is missing, please open a [GitHub issue](https://github.com/rempsyc/busara_dashboard/issues/) and we'll add it.

---

*This dashboard was created with the `pubmedDashboard` package in R: https://rempsyc.github.io/pubmedDashboard/.*

### About the Authors

**[Rémi Thériault](https://remi-theriault.com/)** is currently a PhD candidate in Psychology at the Université du Québec à Montréal, Canada. Overall, Rémi is passionate about putting social-psychological research to use to increase people’s well-being and intrinsic motivation to help one another. He also has a (tiny bit obsessive) passion for programming with R.

**[Patrick Forscher](https://busaracenter.org/patrick-s-forscher/)** is the primary collaborator on this project. He is currently the Research Lead at the Busara Center for Behavioral Economics, in Kenya, where he also leads the Culture, Research Ethics, and MEthods (CREME) project as director of the Meta-Research Team. The dashboard was Patrick's original vision, and frequently benefits from his creative input.

**[Busara](https://www.busara.global/)** is the dashboard's sponsor. Busara works with researchers and organizations to advance and apply behavioral science in pursuit of poverty alleviation. They use behavioral science to design solutions for partner organisations that are working to make lives better in the Global South.

# Methods

### Method & Data

The dashboard includes information about the articles (e.g., title, abstract) as well as about the authors, such as their university of affiliation. I obtained these data from PubMed using the PubMed API through the `easyPubMed` package. I determined the country of the first author of each paper from the affiliation address, by matching the university name against a database of world university names obtained from GitHub.
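
As an illustration of the matching step described above, here is a minimal sketch in base R. It is not the code used by `pubmedDashboard`: the lookup table `universities`, the helper `match_country()`, and the example affiliation strings are all hypothetical, and the real database and matching rules may differ.

```r
# Hypothetical lookup table: university name -> country
# (a stand-in for the university names database mentioned above)
universities <- data.frame(
  university = c("McGill University", "University of Nairobi", "Stanford University"),
  country = c("Canada", "Kenya", "United States"),
  stringsAsFactors = FALSE
)

# Return the country of the first known university whose name appears in the
# affiliation string, or NA when no known university is found
match_country <- function(affiliation, lookup = universities) {
  hits <- vapply(
    lookup$university,
    function(u) grepl(u, affiliation, fixed = TRUE),
    logical(1)
  )
  if (any(hits)) lookup$country[which(hits)[1]] else NA_character_
}

# Hypothetical affiliation strings: one complete, one with a department only
affiliations <- c(
  "Department of Psychology, McGill University, Montreal, QC, Canada.",
  "Department of Psychology."
)
countries <- vapply(affiliations, match_country, character(1), USE.NAMES = FALSE)
countries
```

In this sketch, the second affiliation names only a department, so its country stays missing (`NA`).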

The full current list of journals can be obtained through `pubmedDashboard::journal_field$journal_short[-23]`:

```{r}
pubmedDashboard::journal_field$journal_short[-23]
```

Note that PLOS One (element 23 of `journal_field$journal_short`) is excluded because it contains too many papers for this dashboard to handle.
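
For readers less familiar with R, the `[-23]` above is a negative index, which drops one position and keeps everything else. A small sketch, using a hypothetical three-element vector `journals` in place of `journal_field$journal_short`:

```r
# Hypothetical stand-in for journal_field$journal_short
journals <- c("Psychological Science", "PLOS One", "Nature")

# A negative index removes that position and returns the remaining elements
kept <- journals[-2]
kept
```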

### Missing Data & Next Steps

#### **Missing data**

Some of the papers were missing address information; in many cases, the PubMed API provided only the department and no university. It was not possible to identify the country in these cases (one would need to look at the actual papers one by one to make manual corrections). Furthermore, some university names in the data did not match any name in the university database obtained from GitHub. In some cases, I manually corrected university names to reduce the number of missing values. A table of the papers with missing countries is available on the [Missing Data] tab.

#### **Next Steps**

Possible future steps include: (a) obtaining a better, more current university name database (that includes the country of each university), (b) making manual corrections for other research institutes not included in the university database, (c) hosting DT tables on a server to speed up the website and allow the inclusion of a DT table for exploring the raw data, and (d) finding a way to use country flags for the countries-by-journal figure.

# Instructions

**How to Use This Dashboard**

> \* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.
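
The note above can be illustrated with a minimal sketch in base R. The `continent` vector is made-up example data, and this is not necessarily how the `pubmedDashboard` functions compute their tables.

```r
# Hypothetical first-author continents; NA marks an unidentified country
continent <- c("Europe", "North America", NA, "North America", "Asia",
               "North America", NA, "Europe", "North America", "Africa")

# Share of papers with no identified continent, out of ALL papers
missing_pct <- 100 * mean(is.na(continent))

# Continent shares, computed only among papers with a known continent
# (table() drops NA values by default)
known_pct <- 100 * prop.table(table(continent))

round(known_pct, 1)
missing_pct
```

Here 2 of the 10 papers have no identified continent, so the missing share is 20% of all papers, while North America accounts for 4 of the 8 identified papers, that is, 50%. The continent percentages therefore sum to 100 on their own, and the missing percentage is reported separately.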


# Continent {data-navmenu="Continent"}

## Column 1 {.tabset .tabset-fade data-width=2150}

### Waffle plot of journal paper percentages, by continent (each square = 1% of data) {data-height=600}

```{r get_historic_data}
articles.df4 <- read_bind_all_data()

# Keep papers from 1987 onward, because there are almost no publications
# before that, and keep only journals from the official list, because the
# data also contain many other journals
articles.df4 <- articles.df4 %>%
  filter(year >= 1987,
         journal %in% journal_field$journal)

# Note: the following journals are not available on PubMed (confirmed on the official search site):
# Behavioral public policy
# Journal of experimental and behavioral economics
# Quarterly journal of economics

```

```{r clean_journals_continents}
articles.df4 <- clean_journals_continents(articles.df4)

saveRDS(articles.df4, "data/fulldata.rds")

```

```{r continent_waffle_overall}
waffle_continent(articles.df4)

```

### Table of journal paper percentages, by continent {data-height=200}

```{r, continent_table}
table_continent(articles.df4)

```

# Continent, by Year {data-navmenu="Continent"}

## Column 1 {.tabset .tabset-fade data-width=800}

### Scatter plot of journal paper percentages, by continent and year {data-height=600}

```{r, continent_scatter_overall}
scatter_continent_year(articles.df4, method = "loess")

```

### Table of journal paper percentages, by continent and year {data-height=200}

```{r, continent_table_journal_year}
table_continent_year(articles.df4)

```

> \* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.

# Continent, by Journal {data-navmenu="Continent"}

## Column 1 {.tabset .tabset-fade data-width=700}

### Waffle plot of journal paper percentages, by continent and journal (each square = 1% of data) {data-height=600}

```{r continent_table_journal_figure}
waffle_continent_journal(articles.df4)

```

### Table of journal paper percentages, by continent and journal {data-height=200}

```{r continent_table_journal}
table_continent_journal(articles.df4)

```

> \* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.

# Country {data-navmenu="Country"}

## Column 1 {.tabset .tabset-fade data-width=800}

### Waffle plot of journal paper percentages, by country (each flag = 1% of data)

```{r country_table_overall, fig.width=4.5, fig.height=4.5}
waffle_country(articles.df4)

```

### Table of journal paper percentages, by country {data-height=200}

```{r country_table_journal}
table_country(articles.df4)

```

> \* Percentages are calculated after excluding missing values. The *Missing* row shows the real percentage of missing values.

# Country, by Year {data-navmenu="Country"}

## Column 1 {.tabset .tabset-fade data-width=1000}

### Scatter plot of journal paper percentages, by country and year

```{r, country_series_year}
scatter_country_year(articles.df4, method = "lm")

# Include flags on scatter plot
# Note: doesn't work with ggplotly it seems
# library(ggflags)
# df.country.year %>%
#   mutate(year = as.numeric(year),
#          country = countrycode(country, "country.name", "genc2c"),
#          country = tolower(country),
#          country = as.factor(country)) %>%
#   nice_scatter(
#              predictor = "year",
#              response = "percentage",
#              group = "country",
#              colours = colors,
#              method = "lm",
#              groups.order = "decreasing",
#              ytitle = "% of All Papers") + 
#   geom_flag(aes(country = country)) +#%>%
#   scale_country(aes(country = country)) #%>%
  #ggplotly(tooltip = c("x", "y"))

# Time series dygraph 
# dygraph_year(articles.df4)
# dygraph_year(articles.df4, "country")

```

### Table of journal paper percentages, by country and year {data-height=200}

```{r, country_table_year}
table_country_year(articles.df4)

```

> \* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.

# Country, by Journal {data-navmenu="Country"}

## Column 1 {.tabset .tabset-fade data-width=800}

### Waffle plot of journal paper percentages, by country and journal (each square = 1% of data) {data-height=600}

```{r country_table_journal_figure}

waffle_country_journal(articles.df4)

```

### Table of journal paper percentages, by country and journal {data-height=200}

```{r country_table_journal2}
table_country_journal(articles.df4)

```

> \* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.


# Psychology {data-navmenu="Field-Specific"}

## Column 1 {.tabset .tabset-fade data-width=800}

### Scatter plot of journal paper percentages, by continent and year {data-height=600}

```{r, continent_scatter_overall_psychology}
articles.df4 %>% 
  filter(field == "psychology") %>% 
  scatter_continent_year(method = "loess")

```

### Table of journal paper percentages, by continent and year {data-height=200}

```{r, continent_table_journal_year_psychology}
articles.df4 %>% 
  filter(field == "psychology") %>% 
  table_continent_year()

```

> \* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.

# Economics {data-navmenu="Field-Specific"}

## Column 1 {.tabset .tabset-fade data-width=800}

### Scatter plot of journal paper percentages, by continent and year {data-height=600}

```{r, continent_scatter_overall_economics}
articles.df4 %>% 
  filter(field == "economics") %>% 
  scatter_continent_year(method = "loess")

```

### Table of journal paper percentages, by continent and year {data-height=200}

```{r, continent_table_journal_year_economics}
articles.df4 %>% 
  filter(field == "economics") %>% 
  table_continent_year()

```

> \* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.

# General {data-navmenu="Field-Specific"}

## Column 1 {.tabset .tabset-fade data-width=800}

### Scatter plot of journal paper percentages, by continent and year {data-height=600}

```{r, continent_scatter_overall_general}
articles.df4 %>% 
  filter(field == "general") %>% 
  scatter_continent_year(method = "loess")

```

### Table of journal paper percentages, by continent and year {data-height=200}

```{r, continent_table_journal_year_general}
articles.df4 %>% 
  filter(field == "general") %>% 
  table_continent_year()

```

> \* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.

# Figure 1 {data-navmenu="Other"}

## Column 1

### Figure 1, Proportion of American First Authors 1987-Today (Replication of Figure 1 in Arnett, 2008, and Thalmayer et al., 2021)

```{r, fig1}
scatter_figure1(articles.df4, original = TRUE)

```
  
# Missing Data {data-navmenu="Other"}

## Column 1 {.tabset .tabset-fade data-width=700}

### Use this table to investigate why the country or university could not be identified

```{r missing_universities, warning=FALSE}
# Keep only the first tenth of the rows so the table stays small enough to display
articles.df4 %>% 
  slice_head(prop = 0.1) %>% 
  table_missing_country()

```

### Important Note

**This data table is too large to display online in full**

Initially, this table included every publication for which the country and continent could not be identified. However, after large journals were added (Nature, Science, and PLOS One), the table became too large to display within the dashboard and made the webpage slow to load. Therefore, only one tenth of the rows is shown for now, until we reduce the size of the table by identifying more universities and thereby fixing the missing data. This should also make the rest of the dashboard faster to use.